Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark set_print as unsafe #374

Closed
wants to merge 1 commit into from

Conversation

mendess
Copy link
Contributor

@mendess mendess commented Mar 8, 2023

Unfortunately set_print cannot be safely wrapped as the underlying call is not thread safe, as documented in another libbpf_ function here

As such I think it needs to be marked unsafe for this library to be considered a safe wrapper around libbpf

I also deleted the "Temporarily disable printing" example as it seem unlikely that usage pattern is realistic outside the most trivial of programs.

@mendess mendess force-pushed the mendess/set-print-unsafe branch 2 times, most recently from f524c8f to 637eb8b Compare March 8, 2023 14:27
@anakryiko
Copy link
Member

There is nothing unsfae about set_print just because it's not thread-safe, or am I missing something? Should any non-synchronized API be marked unsafe? That seems weird.

Please don't do this.

@danielocfb
Copy link
Collaborator

danielocfb commented Mar 8, 2023

Thanks for the pull request, @mendess! I don't believe I follow the argumentation in favor of making it unsafe, to be honest. The link you reference talks about thread safety. That is expressed in Rust via the Sync (and to some degree Send) marker traits.

Unsafety, in the Rust sense, refers to violation of data race freedom and or memory safety (out of bounds accesses etc.). I don't believe either is possible here. Can you please clarify why you believe this to be necessary?

Edit: Ah, sorry, mid-air collision with Andrii's post, which basically says the same thing.

@mendess
Copy link
Contributor Author

mendess commented Mar 8, 2023

Should any non-synchronized API be marked unsafe?

if it mutates a global variable then, per rust's rules, yes

This api will mutate a global variable in C, global variable which will in turn be read by almost all the functions in libbpf. This means that effectively a mutable borrow and an immutable borrow of that same global variable can exist simultaneously, which violates Rust's rules. This is the reasoning I followed to arrive at the conclusion that this had to sadly be marked unsafe.

If this doesn't make sense please tell me where I made the mistake.

@danielocfb
Copy link
Collaborator

danielocfb commented Mar 8, 2023

Should any non-synchronized API be marked unsafe?
if it mutates a global variable then, per rust's rules, yes

This api will mutate a global variable in C, global variable which will in turn be read by almost all the functions in libbpf. This means that effectively a mutable borrow and an immutable borrow of that same global variable can exist simultaneously, which violates Rust's rules. This is the reasoning I followed to arrive at the conclusion that this had to sadly be marked unsafe.

If this doesn't make sense please tell me where I made the mistake.

The point is (as discussed at length here) that the C stuff you are referring to effectively maps to Rust's AtomicUsize. And you can indeed use it safely in a global context: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=89e2b5d8074073e1f2727efa9810894c

@mendess
Copy link
Contributor Author

mendess commented Mar 8, 2023

according to the C standard, In section 5.1.2.4 "Multi-threaded executions and data races" you can read this in paragraph 25:

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

In paragraph 4 you can get the definition of "conflict":

Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.

In paragraph 5 you can get the definition of "atomic operation", which doesn't include "word sized, aligned writes/reads".

So according to the standard it's undefined behavior to have a read and a write to the same location from different threads when no ordering is specified.

Now, I know that on x86/x64 architectures they might actually be atomic, but that only applies to the assembly generated, when you're writing C you're writing code for the C abstract machine, which has an optimizer that can reorder statements or even delete part of the code as I'm sure we all know. As such I believe you should always use true atomic operations when writing/reading to global memory locations.
If a function can invoke UB in C then it naturally follows that calling that function from rust can invoke UB in rust, as such, I decided it was best to mark it as unsafe.

@danielocfb
Copy link
Collaborator

As such I believe you should always use true atomic operations when writing/reading to global memory locations.

I agree (as I stated at least twice beforehand over in #245), which is why I advise you to bring this up with libbpf folks. From my perspective, it has no bearing on libbpf-rs, however. Yes, this is a potential violation, but this library cannot and will not vouch for current implementations details of the C implementation and if there are bugs in said C implementation, they should be fixed there and not papered over here.

@danielocfb danielocfb closed this Mar 8, 2023
@mendess
Copy link
Contributor Author

mendess commented Mar 8, 2023

this library cannot and will not vouch for current implementations details of the C implementation

But this is not an implementation detail, the library itself states that libbpf_ functions are not thread safe, thus, if we can't safely wrap them we should mark them as unsafe.

if there are bugs in said C implementation, they should be fixed there and not papered over here

I agree that libbpf bugs should not be papered over here but this isn't a libbpf bug, it's a documented part of the API. The only reason I dug around the implementation details to know why it wasn't thread safe was just to help in understanding of the reasons behind the marking as unsafe.

@danielocfb
Copy link
Collaborator

danielocfb commented Mar 8, 2023

this library cannot and will not vouch for current implementations details of the C implementation

But this is not an implementation detail, the library itself states that libbpf_ functions are not thread safe, thus, if we can't safely wrap them we should mark them as unsafe.

From #245 (comment):
So Andrii's stance seems to be that aligned pointer reads and writes are always atomic and so there is no issue. Personally, I think it should be an explicit atomic read and write, but I guess whatever. That reasoning would then presumably also apply to use_debugfs().

if there are bugs in said C implementation, they should be fixed there and not papered over here

I agree that libbpf bugs should not be papered over here but this isn't a libbpf bug, it's a documented part of the API. The only reason I dug around the implementation details to know why it wasn't thread safe was just to help in understanding of the reasons behind the marking as unsafe.

So I agree that it appears as if this comment and this one contradict what was conveyed above. And I cannot reconcile the two without tea leaf reading (though I will say that I have a hunch...). So @anakryiko can you please confirm your intentions or clarify how I misrepresented what we discussed offline?

@anakryiko
Copy link
Member

this library cannot and will not vouch for current implementations details of the C implementation

But this is not an implementation detail, the library itself states that libbpf_ functions are not thread safe, thus, if we can't safely wrap them we should mark them as unsafe.

From #245 (comment): So Andrii's stance seems to be that aligned pointer reads and writes are always atomic and so there is no issue. Personally, I think it should be an explicit atomic read and write, but I guess whatever. That reasoning would then presumably also apply to use_debugfs().

I don't mind making libbpf_set_print() using atomic exchange internally. Let's do that change. We can even say that it's thread-safe at that point, though libbpf won't provide any strict ordering guarantees between libbpf_set_print() and seeing new print callback on another thread. Either way application shouldn't rely on such tiny races for correctness.

if there are bugs in said C implementation, they should be fixed there and not papered over here

I agree that libbpf bugs should not be papered over here but this isn't a libbpf bug, it's a documented part of the API. The only reason I dug around the implementation details to know why it wasn't thread safe was just to help in understanding of the reasons behind the marking as unsafe.

So I agree that it appears seems as if this comment and this one contradict what was conveyed above. And I cannot reconcile the two without tea leaf reading (though I will say that I have a hunch...). So @anakryiko can you please confirm your intentions or clarify how I misrepresented what we discussed offline?

libbpf API not being thread-safe is a general statement about all the APIs. And that's true for most stuff (especially when we are talking about bpf_object and its methods, you definitely don't want to modify bpf_object state from two separate threads). But for stuff like libbpf_set_print(), yes, there is no synchronization, but there is also no inherent unsafety, as it's doing primitive aligned single-word reads/writes. As I said above, atomic exchange makes sense here as well, and we can even specify that this specific API is thread-safe. A bunch of feature-detection APIs can also be marked thread-safe. Please contribute patches to mark such APIs as thread-safe. As a general statement, though, all libbpf APIs are not thread-safe, unless explicitly marked otherwise, of course.

One set of APIs where libbpf-rs probably might want to enforce synchronization is libbpf_register_prog_handler and libbpf_unregister_prog_handler. Those are truly stateful and very non-atomic, and are not synchronized, so I'd prefer libbpf-rs to take some global lock for this pair of operations.

There might be few more APIs like this, though not very likely (I don't remember all of them off the top of my head, but we can audit all this). We can go audit them one by one and make a decision.

But for stuff like use_debugfs() and kernel_supports(), we don't need atomic operations. It works and will work correctly for the purposes it's being used for, even if some portion of C standard (sorry, not an expert on standards) declares some of the aspects as undefined behavior. If compilers somehow exploit such UBs, they will break, say, Linux kernel (there are lots of not explicitly atomic operations done there, see READ_ONCE/WRITE_ONCE), and they will just undo such "optimizations".

In any case, this is internal libbpf implementation details that won't lead to any crashes or misbehaviors, it's all "eventually consistent" caches. Worst case we'll repeat some feature detection on two parallel threads, which is not a bug or a problem. But this is all implementation details, and they are called from APIs that indeed are not thread-safe (as I mentioned, bpf_object manipulations are not, generally speaking, thread-safe).

But also note that it's a bit more nuanced still, as, say, bpf_object__load() can be called by multiple parallel threads provided it's called for two independent instances of struct bpf_object *.

I agree we can document all that better on libbpf side, of course. As usual, patches are very welcome.

@danielocfb
Copy link
Collaborator

Thanks Andrii. That confirms the mental model about libbpf I assumed above. So with respect to the safety/unsafety of set_print, I suggest we make libbpf_set_print use atomic instructions as mentioned. @mendess are you interested in contributing this change to libbpf? Otherwise I will put it on my plate.

Once that is done I believe we should be in agreement that there are no issues and that set_print is sound being a safe function, correct, @mendess?

@mendess
Copy link
Contributor Author

mendess commented Mar 10, 2023

Yeah, sounds good to me. I would contribute but I've personally never contributed to anything in the Linux kernel tree (yet), so it might take a bit longer to get acquainted with the mailing list workflow. Feel free to do it if you want :)

@danielocfb
Copy link
Collaborator

Yeah, sounds good to me. I would contribute but I've personally never contributed to anything in the Linux kernel tree (yet), so it might take a bit longer to get acquainted with the mailing list workflow. Feel free to do it if you want :)

It may be a good chance to get acquainted with how things are done there, so if you have entertained the thought of contributing there, I'd be happy to leave it to you. From my perspective, we are not blocked on it, so take all the time you need. Just let me know how you decide.

@anakryiko
Copy link
Member

@danielocfb I have someone on Meta side interested in making relevant changes, so hopefully we'll get this fixed and better documented on libbpf side

@danielocfb
Copy link
Collaborator

Thanks, seem the patch has landed upstream now.

@mendess mendess deleted the mendess/set-print-unsafe branch April 4, 2023 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants